Differences in HDDM parameter reliability for T1 data using either n=552 or n=150 were examined in a separate report on T1 HDDM parameters. No meaningful differences were found between these two sample sizes.
But even 150 is a large sample size for psychological studies, especially for the forced-choice reaction time tasks included in this report. Here we look at how the reliability of raw and DDM measures changes for sample sizes that are more common in studies using these tasks (10, 15, 20, 25, 50, 75, 100, 125).
Note: HDDMs are not refit for each of these sample sizes because (a) there were no differences in parameter stability for n=150 vs. n=552 and (b) a more comprehensive comparison using non-hierarchical estimates and model fit indices will follow. [Should I revisit this? Samples of 150 and 552 might both be too large to change parameter estimates, but the smaller samples that are more common in psychology studies might sway estimates more. If that were the case, wouldn’t we expect the difference between non-hierarchical and hierarchical estimates to be the largest? And if there is no difference there, then we don’t have to worry about it?]
Note: Some variables do not have enough variance to calculate reliability for the different sample sizes. These variables are:
>stroop.post_error_slowing
>simon.std_rt_error
>shape_matching.post_error_slowing
>directed_forgetting.post_error_slowing
>choice_reaction_time.post_error_slowing
>choice_reaction_time.std_rt_error
>dot_pattern_expectancy.post_error_slowing
>motor_selective_stop_signal.go_rt_std_error
>motor_selective_stop_signal.go_rt_error
>attention_network_task.post_error_slowing
>recent_probes.post_error_slowing
>simon.post_error_slowing
>dot_pattern_expectancy.BY_errors
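The subsampling idea can be sketched as follows. This is a minimal illustration on simulated data, not the code in `ddm_reldf_sample_size.R`: the helper names `icc_3_1` and `subsample_icc`, the simulated scores, and the choice of ICC(3,1) are all assumptions made for the example. It also shows why a measure with no variance cannot be assigned a reliability, since the ICC becomes 0/0.

```r
# Hypothetical helper: ICC(3,1) for a two-session (t1, t2) design,
# computed from two-way ANOVA mean squares. The report's own pipeline
# may use a different ICC form.
icc_3_1 <- function(t1, t2) {
  n <- length(t1)
  k <- 2
  scores <- cbind(t1, t2)
  grand <- mean(scores)
  ss_total <- sum((scores - grand)^2)
  ss_rows <- k * sum((rowMeans(scores) - grand)^2)  # between subjects
  ss_cols <- n * sum((colMeans(scores) - grand)^2)  # between sessions
  ss_err <- ss_total - ss_rows - ss_cols
  ms_rows <- ss_rows / (n - 1)
  ms_err <- ss_err / ((n - 1) * (k - 1))
  (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
}

# Simulated test-retest scores for one measure (assumed, for illustration).
set.seed(1)
n_total <- 150
true_score <- rnorm(n_total)
t1 <- true_score + rnorm(n_total, sd = 0.7)
t2 <- true_score + rnorm(n_total, sd = 0.7)

# Hypothetical helper: draw `iterations` subsamples of n subjects
# (without replacement) and compute the ICC in each.
subsample_icc <- function(t1, t2, n, iterations = 100) {
  replicate(iterations, {
    idx <- sample(seq_along(t1), n)
    icc_3_1(t1[idx], t2[idx])
  })
}

sample_sizes <- c(10, 15, 20, 25, 50, 75, 100, 125)
icc_by_n <- sapply(sample_sizes, function(n) subsample_icc(t1, t2, n))
colnames(icc_by_n) <- sample_sizes

round(colMeans(icc_by_n), 2)      # mean ICC per sample size
round(apply(icc_by_n, 2, sd), 2)  # spread of the ICC per sample size

# A measure with zero variance yields 0/0, i.e. NaN.
icc_3_1(rep(0, 10), rep(0, 10))
```

Each column of `icc_by_n` holds the 100 ICC estimates for one sample size, which is the shape of data that the per-iteration models in this report operate on.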
source('/Users/zeynepenkavi/Dropbox/PoldrackLab/SRO_DDM_Analyses/code/workspace_scripts/ddm_reldf_sample_size.R')
Warning: Column `dv` joining factor and character vector, coercing into
character vector
Does the mean reliability change with sample size?
Yes. The larger the sample size, the more reliable a given measure is on average. In the model below, the largest increase in mean reliability is between n=10 and n=15, with much smaller gains beyond that. This is important because many studies using these measures have sample sizes <50 per group.
fig_name = 'rel_by_samplesize.jpeg'
knitr::include_graphics(paste0(fig_path, fig_name))
When <15 subjects are used to calculate the measures they are significantly less reliable.
summary(lmer(icc ~ factor(sample_size) + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ factor(sample_size) + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2823478
Scaled residuals:
Min 1Q Median 3Q Max
-605.6 0.0 0.0 0.0 0.4
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.06528 0.2555
iteration (Intercept) 0.00427 0.0653
Residual 65.64367 8.1021
Number of obs: 402034, groups: dv, 505; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.1251 0.0386 3.24
factor(sample_size)15 0.2470 0.0513 4.81
factor(sample_size)20 0.2780 0.0513 5.42
factor(sample_size)25 0.2947 0.0512 5.76
factor(sample_size)50 0.3222 0.0512 6.30
factor(sample_size)75 0.3303 0.0512 6.45
factor(sample_size)100 0.3339 0.0512 6.53
factor(sample_size)125 0.3353 0.0512 6.55
Correlation of Fixed Effects:
(Intr) f(_)15 f(_)20 f(_)25 f(_)50 f(_)75 f(_)10
fctr(sm_)15 -0.665
fctr(sm_)20 -0.665 0.500
fctr(sm_)25 -0.667 0.502 0.502
fctr(sm_)50 -0.667 0.502 0.502 0.504
fctr(sm_)75 -0.667 0.502 0.502 0.504 0.504
fctr(s_)100 -0.667 0.502 0.502 0.504 0.504 0.504
fctr(s_)125 -0.667 0.502 0.502 0.504 0.504 0.504 0.504
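Because `sample_size` enters this model as a factor, the intercept is the estimated mean ICC at the reference level (n=10) and every other coefficient is a difference from that level. The sketch below only does the arithmetic on the estimates printed above; the variable names are introduced here for illustration.

```r
# Fixed-effect estimates copied from the model summary above.
intercept <- 0.1251  # estimated mean ICC at the reference level, n = 10
offsets <- c(`15` = 0.2470, `20` = 0.2780, `25` = 0.2947, `50` = 0.3222,
             `75` = 0.3303, `100` = 0.3339, `125` = 0.3353)

# Implied mean ICC at each sample size.
implied_mean_icc <- c(`10` = intercept, intercept + offsets)

# Gain in mean ICC relative to the next smaller sample size.
gains <- setNames(diff(unname(implied_mean_icc)), names(implied_mean_icc)[-1])

round(implied_mean_icc, 4)
round(gains, 4)
names(which.max(gains))  # sample size reached by the largest single step
```

The implied means run from 0.1251 at n=10 to 0.4604 at n=125, and the step from n=10 to n=15 (0.2470) is by far the largest.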
Are there differences between any other sample sizes? This comparison ignores the differences between variables, but the only differences appear to be between n=10 and each of the larger sample sizes.
with(rel_df_sample_size_summary, pairwise.t.test(mean_icc, sample_size, p.adjust.method = "bonferroni"))
Pairwise comparisons using t tests with pooled SD
data: mean_icc and sample_size
10 15 20 25 50 75 100
15 2e-04 - - - - - -
20 9e-06 1 - - - - -
25 2e-06 1 1 - - - -
50 9e-08 1 1 1 - - -
75 4e-08 1 1 1 1 - -
100 3e-08 1 1 1 1 1 -
125 2e-08 1 1 1 1 1 1
P value adjustment method: bonferroni
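The many adjusted p-values of exactly 1 in this table follow from how the Bonferroni correction works: with 8 sample sizes there are 28 pairwise comparisons, and each raw p-value is multiplied by 28 and capped at 1. A small sketch, using made-up raw p-values purely for illustration:

```r
n_levels <- 8                         # sample sizes 10, 15, ..., 125
n_comparisons <- choose(n_levels, 2)  # number of pairwise comparisons

# Illustrative raw p-values (not taken from the analysis above).
raw_p <- c(1e-9, 0.004, 0.04, 0.5)

# Bonferroni adjustment as done by p.adjust, treating these as 4 of the 28 tests.
adjusted <- p.adjust(raw_p, method = "bonferroni", n = n_comparisons)

# The same adjustment written out by hand.
manual <- pmin(1, raw_p * n_comparisons)

data.frame(raw_p, adjusted, manual)
```

Any raw p-value above 1/28 (about 0.036) is reported as 1 after adjustment, so a 1 in the table does not mean the two group means are identical.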
Does the change in reliability with sample size vary by variable type?
No. The changes do not differ by raw vs. ddm measures or for contrast and condition measures compared to non-contrast measures. Contrast and condition measures are just less reliable overall.
summary(lmer(icc ~ sample_size * ddm_raw + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: icc ~ sample_size * ddm_raw + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2805433
Scaled residuals:
Min 1Q Median 3Q Max
-603.4 0.0 0.0 0.0 0.4
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.06500 0.2550
iteration (Intercept) 0.00433 0.0658
Residual 66.14286 8.1328
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.349578 0.030865 11.33
sample_size 0.001260 0.000400 3.15
ddm_rawraw -0.105882 0.049825 -2.13
sample_size:ddm_rawraw 0.000831 0.000662 1.26
Correlation of Fixed Effects:
(Intr) smpl_s ddm_rw
sample_size -0.681
ddm_rawraw -0.591 0.422
smpl_sz:dd_ 0.412 -0.605 -0.697
summary(lmer(icc ~ sample_size * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula:
icc ~ sample_size * overall_difference + (1 | dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2805380
Scaled residuals:
Min 1Q Median 3Q Max
-603.4 0.0 0.0 0.0 0.3
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.04637 0.2153
iteration (Intercept) 0.00433 0.0658
Residual 66.14275 8.1328
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.518096 0.044356 11.68
sample_size 0.000621 0.000602 1.03
overall_differencecontrast -0.459839 0.065919 -6.98
overall_differencecondition -0.211001 0.054844 -3.85
sample_size:overall_differencecontrast 0.001230 0.000905 1.36
sample_size:overall_differencecondition 0.001343 0.000753 1.78
Correlation of Fixed Effects:
                       (Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd smpl_sz:vrll_dffrnccnt
sample_size            -0.713
ovrll_dffrnccnt        -0.658  0.480
ovrll_dffrnccnd        -0.791  0.577  0.532
smpl_sz:vrll_dffrnccnt  0.475 -0.665 -0.721          -0.384
smpl_sz:vrll_dffrnccnd  0.571 -0.800 -0.384          -0.721           0.532
Does variability of reliability change with sample size?
Not significantly. The SEM tends to decrease with sample size, but the effect is not significant, and the SEMs are always pretty small.
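For reference, a standard error of the mean over the 100 resampled ICCs of each measure and sample size could be computed along these lines. This is a sketch on simulated data; the helper `sem`, the toy data frame, and its values are assumptions, and the summary used in this report may have been built differently.

```r
# Hypothetical helper: standard error of the mean, ignoring missing values.
sem <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Toy stand-in for the per-iteration reliability data:
# 100 iterations x 3 sample sizes x 2 measures.
set.seed(3)
toy <- expand.grid(iteration = 1:100,
                   sample_size = c(10, 50, 125),
                   dv = c("dv_a", "dv_b"))
# Simulated ICCs whose spread shrinks with sample size (an assumption).
toy$icc <- rnorm(nrow(toy), mean = 0.5, sd = 1 / sqrt(toy$sample_size))

# One SEM per measure and sample size.
toy_summary <- aggregate(icc ~ dv + sample_size, data = toy, FUN = sem)
names(toy_summary)[3] <- "sem_icc"
toy_summary
```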
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), sem_icc))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
facet_wrap(~overall_difference)+
ylab("Standard error of mean of reliability \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")+
ylim(0,0.3)
Warning: Removed 16 rows containing missing values (geom_path).
summary(lmer(sem_icc ~ sample_size * overall_difference + (1|dv), rel_df_sample_size_summary))
Linear mixed model fit by REML ['lmerMod']
Formula: sem_icc ~ sample_size * overall_difference + (1 | dv)
Data: rel_df_sample_size_summary
REML criterion at convergence: 9708
Scaled residuals:
Min 1Q Median 3Q Max
-0.12 -0.06 -0.02 0.01 60.38
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 0.000213 0.0146
Residual 0.657690 0.8110
Number of obs: 3992, groups: dv, 499
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.038630 0.039761 0.97
sample_size -0.000334 0.000600 -0.56
overall_differencecontrast 0.048118 0.059791 0.80
overall_differencecondition 0.090710 0.049734 1.82
sample_size:overall_differencecontrast -0.000466 0.000902 -0.52
sample_size:overall_differencecondition -0.000986 0.000750 -1.31
Correlation of Fixed Effects:
                       (Intr) smpl_s ovrll_dffrnccnt ovrll_dffrnccnd smpl_sz:vrll_dffrnccnt
sample_size            -0.792
ovrll_dffrnccnt        -0.665  0.527
ovrll_dffrnccnd        -0.799  0.633  0.532
smpl_sz:vrll_dffrnccnt  0.527 -0.665 -0.792          -0.421
smpl_sz:vrll_dffrnccnd  0.633 -0.799 -0.421          -0.792           0.532
Does between subjects variance change with sample size?
Yes. Between subjects variance decreases with sample size. This is more pronounced for non-contrast measures.
This goes against my intuitions. Looking at the change in the between-subjects variance percentage of individual measures, there seems to be a lot of inter-measure variance (more pronounced below for within-subject variance). I’m not sure whether the measures that show increasing between-subjects variability with sample size have something in common that separates them from those that show decreasing between-subjects variability with sample size (the slight majority).
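To make the three variance percentages concrete, here is one way such a breakdown could be obtained for a single measure: a two-way decomposition of the sums of squares into subject, session, and residual parts. This is a sketch on simulated data; the mapping of the ANOVA rows onto the names `var_subs_pct`, `var_ind_pct`, and `var_resid_pct` is an assumption, and the report's own computation may differ.

```r
# Simulated test-retest data in long format (assumed, for illustration).
set.seed(2)
n <- 50
true_score <- rnorm(n)
long <- data.frame(
  sub_id = factor(rep(seq_len(n), times = 2)),
  time = factor(rep(c("t1", "t2"), each = n)),
  score = c(true_score + rnorm(n, sd = 0.5),         # session 1
            true_score + 0.3 + rnorm(n, sd = 0.5))   # session 2, shifted mean
)

# Sums of squares for subjects, sessions, and residual, in that order.
fit <- anova(lm(score ~ sub_id + time, data = long))
ss <- fit[["Sum Sq"]]
names(ss) <- c("var_subs_pct", "var_ind_pct", "var_resid_pct")

# Express each part as a percentage of the total.
var_pct <- 100 * ss / sum(ss)
round(var_pct, 1)
sum(var_pct)
```

Because the three shares sum to 100, a drop in one share must be offset by a rise in at least one of the others, which is worth keeping in mind when reading the three models in this and the following sections.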
# Group means of the between-subjects variance percentage, used for the bold lines and points
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_subs_pct = mean(mean_var_subs_pct, na.rm=TRUE))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_subs_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_subs_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of \n between subjects variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_subs_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_subs_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 3274121
Scaled residuals:
Min 1Q Median 3Q Max
-4.931 -0.705 0.054 0.704 4.571
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 109.05 10.44
iteration (Intercept) 1.26 1.12
Residual 212.49 14.58
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 57.911 0.898
factor(sample_size)15 -0.139 0.175
factor(sample_size)20 -0.467 0.175
factor(sample_size)25 -0.964 0.175
factor(sample_size)50 -3.428 0.174
factor(sample_size)75 -5.652 0.174
factor(sample_size)100 -7.458 0.174
factor(sample_size)125 -9.173 0.174
overall_differencecontrast -14.208 1.340
overall_differencecondition -5.099 1.115
factor(sample_size)15:overall_differencecontrast 0.298 0.262
factor(sample_size)20:overall_differencecontrast 0.611 0.262
factor(sample_size)25:overall_differencecontrast 1.143 0.262
factor(sample_size)50:overall_differencecontrast 3.440 0.262
factor(sample_size)75:overall_differencecontrast 5.385 0.262
factor(sample_size)100:overall_differencecontrast 6.711 0.262
factor(sample_size)125:overall_differencecontrast 8.236 0.262
factor(sample_size)15:overall_differencecondition 0.219 0.218
factor(sample_size)20:overall_differencecondition 0.607 0.218
factor(sample_size)25:overall_differencecondition 0.897 0.218
factor(sample_size)50:overall_differencecondition 1.993 0.218
factor(sample_size)75:overall_differencecondition 2.978 0.218
factor(sample_size)100:overall_differencecondition 3.532 0.218
factor(sample_size)125:overall_differencecondition 4.155 0.218
t value
(Intercept) 64.47
factor(sample_size)15 -0.80
factor(sample_size)20 -2.68
factor(sample_size)25 -5.53
factor(sample_size)50 -19.64
factor(sample_size)75 -32.39
factor(sample_size)100 -42.74
factor(sample_size)125 -52.57
overall_differencecontrast -10.60
overall_differencecondition -4.57
factor(sample_size)15:overall_differencecontrast 1.14
factor(sample_size)20:overall_differencecontrast 2.33
factor(sample_size)25:overall_differencecontrast 4.36
factor(sample_size)50:overall_differencecontrast 13.12
factor(sample_size)75:overall_differencecontrast 20.54
factor(sample_size)100:overall_differencecontrast 25.60
factor(sample_size)125:overall_differencecontrast 31.41
factor(sample_size)15:overall_differencecondition 1.00
factor(sample_size)20:overall_differencecondition 2.78
factor(sample_size)25:overall_differencecondition 4.11
factor(sample_size)50:overall_differencecondition 9.14
factor(sample_size)75:overall_differencecondition 13.65
factor(sample_size)100:overall_differencecondition 16.19
factor(sample_size)125:overall_differencecondition 19.05
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
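The interaction terms in this model can be read by adding up the relevant coefficients. The sketch below does this for the smallest and largest sample sizes, using only the estimates printed above; `non_contrast` is a label introduced here for the reference level of `overall_difference`.

```r
# Estimates copied from the model summary above.
intercept <- 57.911   # non-contrast measures at n = 10
main_125 <- -9.173    # factor(sample_size)125
type_offset <- c(non_contrast = 0, contrast = -14.208, condition = -5.099)
interaction_125 <- c(non_contrast = 0, contrast = 8.236, condition = 4.155)

# Implied between-subjects variance percentage at n = 10 and n = 125.
pct_at_10 <- intercept + type_offset
pct_at_125 <- pct_at_10 + main_125 + interaction_125

change <- pct_at_125 - pct_at_10
round(rbind(pct_at_10, pct_at_125, change), 3)
```

The implied change from n=10 to n=125 is about -9.2 percentage points for non-contrast measures, -5.0 for condition measures, and -0.9 for contrast measures, which is the sense in which the decrease is more pronounced for non-contrast measures.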
Does within subjects variance change with sample size?
Yes. Within-subject variance increases with sample size. This again goes against my intuition, but here the inter-measure differences are even more pronounced. There appear to be some measures for which the change between the two measurements at different time points is larger the more subjects are tested, and others that show a small decrease in within-subject variance with larger sample sizes. I still don’t know if anything distinguishes these two types of measures.
# Group means of the within-subjects variance percentage, used for the bold lines and points
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_ind_pct = mean(mean_var_ind_pct, na.rm=TRUE))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_ind_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_ind_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of \n within subjects variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_ind_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_ind_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 3471555
Scaled residuals:
Min 1Q Median 3Q Max
-4.551 -0.750 -0.171 0.698 4.326
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 132.28 11.50
iteration (Intercept) 1.41 1.19
Residual 348.68 18.67
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 19.234 0.992
factor(sample_size)15 0.504 0.224
factor(sample_size)20 1.149 0.224
factor(sample_size)25 1.844 0.224
factor(sample_size)50 5.282 0.224
factor(sample_size)75 8.154 0.224
factor(sample_size)100 10.486 0.224
factor(sample_size)125 12.721 0.224
overall_differencecontrast 4.621 1.481
overall_differencecondition 1.928 1.232
factor(sample_size)15:overall_differencecontrast -0.606 0.336
factor(sample_size)20:overall_differencecontrast -1.230 0.336
factor(sample_size)25:overall_differencecontrast -1.885 0.336
factor(sample_size)50:overall_differencecontrast -4.844 0.336
factor(sample_size)75:overall_differencecontrast -7.225 0.336
factor(sample_size)100:overall_differencecontrast -8.779 0.336
factor(sample_size)125:overall_differencecontrast -10.817 0.336
factor(sample_size)15:overall_differencecondition -0.297 0.280
factor(sample_size)20:overall_differencecondition -0.634 0.280
factor(sample_size)25:overall_differencecondition -0.928 0.279
factor(sample_size)50:overall_differencecondition -2.201 0.279
factor(sample_size)75:overall_differencecondition -3.192 0.279
factor(sample_size)100:overall_differencecondition -3.630 0.279
factor(sample_size)125:overall_differencecondition -4.312 0.279
t value
(Intercept) 19.39
factor(sample_size)15 2.25
factor(sample_size)20 5.14
factor(sample_size)25 8.25
factor(sample_size)50 23.63
factor(sample_size)75 36.48
factor(sample_size)100 46.91
factor(sample_size)125 56.91
overall_differencecontrast 3.12
overall_differencecondition 1.57
factor(sample_size)15:overall_differencecontrast -1.80
factor(sample_size)20:overall_differencecontrast -3.66
factor(sample_size)25:overall_differencecontrast -5.61
factor(sample_size)50:overall_differencecontrast -14.42
factor(sample_size)75:overall_differencecontrast -21.51
factor(sample_size)100:overall_differencecontrast -26.14
factor(sample_size)125:overall_differencecontrast -32.21
factor(sample_size)15:overall_differencecondition -1.06
factor(sample_size)20:overall_differencecondition -2.27
factor(sample_size)25:overall_differencecondition -3.32
factor(sample_size)50:overall_differencecondition -7.88
factor(sample_size)75:overall_differencecondition -11.42
factor(sample_size)100:overall_differencecondition -12.99
factor(sample_size)125:overall_differencecondition -15.43
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Does residual variance change with sample size?
# Group means of the residual variance percentage, used for the bold lines and points
tmp = rel_df_sample_size_summary %>%
na.exclude()%>%
group_by(overall_difference, sample_size, ddm_raw) %>%
summarise(mean_var_resid_pct = mean(mean_var_resid_pct, na.rm=TRUE))
rel_df_sample_size_summary %>%
na.exclude() %>%
ggplot(aes(factor(sample_size), mean_var_resid_pct))+
geom_line(aes(group = dv, color=ddm_raw), alpha = 0.1)+
geom_line(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw, group=ddm_raw))+
geom_point(data = tmp, aes(factor(sample_size),mean_var_resid_pct, color=ddm_raw))+
facet_wrap(~overall_difference)+
ylab("Mean percentage of residual variance \n of 100 samples of size n")+
xlab("Sample size")+
theme(legend.title = element_blank(),
legend.position = "bottom")
summary(lmer(var_resid_pct ~ factor(sample_size) * overall_difference + (1|dv) + (1|iteration), rel_df_sample_size))
Linear mixed model fit by REML ['lmerMod']
Formula: var_resid_pct ~ factor(sample_size) * overall_difference + (1 |
dv) + (1 | iteration)
Data: rel_df_sample_size
REML criterion at convergence: 2942477
Scaled residuals:
Min 1Q Median 3Q Max
-4.591 -0.624 -0.037 0.560 8.251
Random effects:
Groups Name Variance Std.Dev.
dv (Intercept) 39.93 6.319
iteration (Intercept) 0.44 0.663
Residual 92.57 9.621
Number of obs: 399034, groups: dv, 499; iteration, 100
Fixed effects:
Estimate Std. Error
(Intercept) 22.8548 0.5443
factor(sample_size)15 -0.3645 0.1153
factor(sample_size)20 -0.6816 0.1152
factor(sample_size)25 -0.8793 0.1152
factor(sample_size)50 -1.8549 0.1152
factor(sample_size)75 -2.5014 0.1152
factor(sample_size)100 -3.0273 0.1152
factor(sample_size)125 -3.5482 0.1152
overall_differencecontrast 9.5866 0.8124
overall_differencecondition 3.1707 0.6758
factor(sample_size)15:overall_differencecontrast 0.3080 0.1731
factor(sample_size)20:overall_differencecontrast 0.6197 0.1731
factor(sample_size)25:overall_differencecontrast 0.7418 0.1731
factor(sample_size)50:overall_differencecontrast 1.4046 0.1730
factor(sample_size)75:overall_differencecontrast 1.8407 0.1730
factor(sample_size)100:overall_differencecontrast 2.0673 0.1730
factor(sample_size)125:overall_differencecontrast 2.5810 0.1730
factor(sample_size)15:overall_differencecondition 0.0784 0.1440
factor(sample_size)20:overall_differencecondition 0.0268 0.1440
factor(sample_size)25:overall_differencecondition 0.0313 0.1440
factor(sample_size)50:overall_differencecondition 0.2082 0.1440
factor(sample_size)75:overall_differencecondition 0.2141 0.1440
factor(sample_size)100:overall_differencecondition 0.0988 0.1440
factor(sample_size)125:overall_differencecondition 0.1573 0.1440
t value
(Intercept) 41.99
factor(sample_size)15 -3.16
factor(sample_size)20 -5.92
factor(sample_size)25 -7.63
factor(sample_size)50 -16.10
factor(sample_size)75 -21.72
factor(sample_size)100 -26.28
factor(sample_size)125 -30.81
overall_differencecontrast 11.80
overall_differencecondition 4.69
factor(sample_size)15:overall_differencecontrast 1.78
factor(sample_size)20:overall_differencecontrast 3.58
factor(sample_size)25:overall_differencecontrast 4.29
factor(sample_size)50:overall_differencecontrast 8.12
factor(sample_size)75:overall_differencecontrast 10.64
factor(sample_size)100:overall_differencecontrast 11.95
factor(sample_size)125:overall_differencecontrast 14.92
factor(sample_size)15:overall_differencecondition 0.54
factor(sample_size)20:overall_differencecondition 0.19
factor(sample_size)25:overall_differencecondition 0.22
factor(sample_size)50:overall_differencecondition 1.45
factor(sample_size)75:overall_differencecondition 1.49
factor(sample_size)100:overall_differencecondition 0.69
factor(sample_size)125:overall_differencecondition 1.09
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Conclusion: Larger samples are better for reliability, but not always for the same reason: for some variables this is due to increasing between-subjects variance, while for others it’s due to decreasing residual variance (?).
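The two routes named in the conclusion can be illustrated with a toy calculation. It assumes an agreement-type ICC written as the between-subjects variance divided by the sum of all three components; the function name and the numbers are hypothetical, and the ICC used elsewhere in this report may be defined differently.

```r
# Hypothetical helper: ICC as the share of between-subjects variance
# in the total (between-subjects + within-subjects + residual).
icc_from_components <- function(var_subs, var_ind, var_resid) {
  var_subs / (var_subs + var_ind + var_resid)
}

# Made-up variance components in arbitrary units.
baseline <- icc_from_components(var_subs = 4, var_ind = 1, var_resid = 5)
more_between <- icc_from_components(var_subs = 6, var_ind = 1, var_resid = 5)
less_resid <- icc_from_components(var_subs = 4, var_ind = 1, var_resid = 3)

c(baseline = baseline, more_between = more_between, less_resid = less_resid)
```

Under this definition the baseline ICC is 0.4, and both raising the between-subjects variance and lowering the residual variance bring it to 0.5, so the same gain in reliability can come from either source.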
rm(rel_df_sample_size, rel_df_sample_size_summary)